{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Check whether an LLM response contains PII (Personally Identifiable Information)\n",
    "\n",
    "**Using the `PIIFilter` validator**\n",
    "\n",
    "This is a simple check that looks for the presence of a few common PII patterns\n",
    "It is not intended to be a comprehensive check for PII and to be a quick check that can be used to filter out responses that are likely to contain PII. It uses the Microsoft Presidio library to check for PII.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[38;5;2m✔ Download and installation successful\u001b[0m\n",
      "You can now load the package via spacy.load('en_core_web_lg')\n",
      "Installing hub:\u001b[35m/\u001b[0m\u001b[35m/guardrails/\u001b[0m\u001b[95mdetect_pii...\u001b[0m\n",
      "✅Successfully installed guardrails/detect_pii!\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Install the necessary packages\n",
    "! pip install presidio-analyzer presidio-anonymizer -q\n",
    "! python -m spacy download en_core_web_lg -q\n",
    "\n",
    "! guardrails hub install hub://guardrails/detect_pii --quiet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the guardrails package\n",
    "from guardrails.hub import DetectPII\n",
    "import guardrails as gd\n",
    "from rich import print"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create Guard object with this validator\n",
    "# One can specify either pre-defined set of PII or SPI (Sensitive Personal Information) entities by passing in the `pii` or `spi` argument respectively.\n",
    "# It can be passed either durring intialization or later through the metadata argument in parse method.\n",
    "\n",
    "# One can also pass in a list of entities supported by Presidio to the `pii_entities` argument.\n",
    "guard = gd.Guard().use(DetectPII(pii_entities=\"pii\", on_fail=\"fix\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ValidationOutcome</span><span style=\"font-weight: bold\">(</span>\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">raw_llm_output</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'My email address is demo@lol.com, and my phone number is 1234567890'</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">validated_output</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'My email address is &lt;EMAIL_ADDRESS&gt;, and my phone number is &lt;PHONE_NUMBER&gt;'</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">reask</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">validation_passed</span>=<span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">error</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span>\n",
       "<span style=\"font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35mValidationOutcome\u001b[0m\u001b[1m(\u001b[0m\n",
       "    \u001b[33mraw_llm_output\u001b[0m=\u001b[32m'My email address is demo@lol.com, and my phone number is 1234567890'\u001b[0m,\n",
       "    \u001b[33mvalidated_output\u001b[0m=\u001b[32m'My email address is \u001b[0m\u001b[32m<\u001b[0m\u001b[32mEMAIL_ADDRESS\u001b[0m\u001b[32m>, and my phone number is <PHONE_NUMBER\u001b[0m\u001b[32m>\u001b[0m\u001b[32m'\u001b[0m,\n",
       "    \u001b[33mreask\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
       "    \u001b[33mvalidation_passed\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n",
       "    \u001b[33merror\u001b[0m=\u001b[3;35mNone\u001b[0m\n",
       "\u001b[1m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Parse the text\n",
    "text = \"My email address is demo@lol.com, and my phone number is 1234567890\"\n",
    "output = guard.parse(\n",
    "    llm_output=text,\n",
    ")\n",
    "\n",
    "# Print the output\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, both EMAIL_ADDRESS and PHONE_NUMBER are detected as PII.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ValidationOutcome</span><span style=\"font-weight: bold\">(</span>\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">raw_llm_output</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'My email address is demo@lol.com, and my phone number is 1234567890'</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">validated_output</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'My email address is &lt;EMAIL_ADDRESS&gt;, and my phone number is 1234567890'</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">reask</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">validation_passed</span>=<span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">error</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span>\n",
       "<span style=\"font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35mValidationOutcome\u001b[0m\u001b[1m(\u001b[0m\n",
       "    \u001b[33mraw_llm_output\u001b[0m=\u001b[32m'My email address is demo@lol.com, and my phone number is 1234567890'\u001b[0m,\n",
       "    \u001b[33mvalidated_output\u001b[0m=\u001b[32m'My email address is \u001b[0m\u001b[32m<\u001b[0m\u001b[32mEMAIL_ADDRESS\u001b[0m\u001b[32m>\u001b[0m\u001b[32m, and my phone number is 1234567890'\u001b[0m,\n",
       "    \u001b[33mreask\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
       "    \u001b[33mvalidation_passed\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n",
       "    \u001b[33merror\u001b[0m=\u001b[3;35mNone\u001b[0m\n",
       "\u001b[1m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Let's test with passing through metadata for the same guard object\n",
    "# This will take precendence over the entities passed in during initialization\n",
    "output = guard.parse(\n",
    "    llm_output=text,\n",
    "    metadata={\"pii_entities\": [\"EMAIL_ADDRESS\"]},\n",
    ")\n",
    "\n",
    "# Print the output\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see here, only EMAIL_ADDRESS is detected as PII, and the PHONE_NUMBER is not detected as PII.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's try with SPI entities\n",
    "# Create a new guard object\n",
    "guard = gd.Guard().use(DetectPII(pii_entities=\"spi\", on_fail=\"fix\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ValidationOutcome</span><span style=\"font-weight: bold\">(</span>\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">raw_llm_output</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'My email address is demo@xyz.com, and my account number is 1234789012367654.'</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">validated_output</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'My email address is demo@xyz.com, and my account number is &lt;US_BANK_NUMBER&gt;.'</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">reask</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">validation_passed</span>=<span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">error</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span>\n",
       "<span style=\"font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35mValidationOutcome\u001b[0m\u001b[1m(\u001b[0m\n",
       "    \u001b[33mraw_llm_output\u001b[0m=\u001b[32m'My email address is demo@xyz.com, and my account number is 1234789012367654.'\u001b[0m,\n",
       "    \u001b[33mvalidated_output\u001b[0m=\u001b[32m'My email address is demo@xyz.com, and my account number is \u001b[0m\u001b[32m<\u001b[0m\u001b[32mUS_BANK_NUMBER\u001b[0m\u001b[32m>\u001b[0m\u001b[32m.'\u001b[0m,\n",
       "    \u001b[33mreask\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
       "    \u001b[33mvalidation_passed\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n",
       "    \u001b[33merror\u001b[0m=\u001b[3;35mNone\u001b[0m\n",
       "\u001b[1m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Parse text\n",
    "text = \"My email address is demo@xyz.com, and my account number is 1234789012367654.\"\n",
    "\n",
    "output = guard.parse(\n",
    "    llm_output=text,\n",
    ")\n",
    "\n",
    "# Print the output\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, only the US_BANK_NUMBER is detected as PII, as specified in the \"spi\" entities. Refer to the documentation for more information on the \"pii\" and \"spi\" entities. Obviosuly, you can pass in any [Presidio-supported entities](https://microsoft.github.io/presidio/supported_entities/) through the metadata.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">ValidationOutcome</span><span style=\"font-weight: bold\">(</span>\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">raw_llm_output</span>=<span style=\"color: #008000; text-decoration-color: #008000\">\"My ITIN is 923756789 and my driver's license number is 87651239\"</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">validated_output</span>=<span style=\"color: #008000; text-decoration-color: #008000\">\"My ITIN is &lt;US_ITIN&gt; and my driver's license number is &lt;US_DRIVER_LICENSE&gt;\"</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">reask</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">validation_passed</span>=<span style=\"color: #00ff00; text-decoration-color: #00ff00; font-style: italic\">True</span>,\n",
       "    <span style=\"color: #808000; text-decoration-color: #808000\">error</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-style: italic\">None</span>\n",
       "<span style=\"font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35mValidationOutcome\u001b[0m\u001b[1m(\u001b[0m\n",
       "    \u001b[33mraw_llm_output\u001b[0m=\u001b[32m\"My\u001b[0m\u001b[32m ITIN is 923756789 and my driver's license number is 87651239\"\u001b[0m,\n",
       "    \u001b[33mvalidated_output\u001b[0m=\u001b[32m\"My\u001b[0m\u001b[32m ITIN is \u001b[0m\u001b[32m<\u001b[0m\u001b[32mUS_ITIN\u001b[0m\u001b[32m> and my driver's license number is <US_DRIVER_LICENSE\u001b[0m\u001b[32m>\u001b[0m\u001b[32m\"\u001b[0m,\n",
       "    \u001b[33mreask\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
       "    \u001b[33mvalidation_passed\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n",
       "    \u001b[33merror\u001b[0m=\u001b[3;35mNone\u001b[0m\n",
       "\u001b[1m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Another example\n",
    "text = \"My ITIN is 923756789 and my driver's license number is 87651239\"\n",
    "\n",
    "output = guard.parse(\n",
    "    llm_output=text,\n",
    "    metadata={\"pii_entities\": [\"US_ITIN\", \"US_DRIVER_LICENSE\"]},\n",
    ")\n",
    "\n",
    "# Print the output\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### In this way, any PII entity that you want to check for can be passed in through the metadata and masked by Guardrails for your LLM outputs. Of-course, like all other examples, you can integrate this into your own code and workflows through the complete Guard execution.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "guard-venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
